24 research outputs found
Quantitative and evolutionary global analysis of enzyme reaction mechanisms
The most widely used classification system describing enzyme-catalysed reactions
is the Enzyme Commission (EC) number. Understanding enzyme
function is important for both fundamental scientific and pharmaceutical
reasons. The EC classification is essentially unrelated to the reaction mechanism.
In this work we address two important questions related to enzyme
function diversity. First, we investigate the relationship between the reaction
mechanisms as described in the MACiE (Mechanism, Annotation,
and Classification in Enzymes) database and the main top-level class of the
EC classification. Second, we ask how well the biocatalysis of these enzymes is adapted
in nature.
In this thesis, we have retrieved 335 enzyme reactions from the MACiE
database. We consider two ways of encoding the reaction mechanism in
descriptors, and three approaches that encode only the overall chemical
reaction.
We first develop a basic model to cluster
the enzymatic reactions. A global study of enzyme reaction mechanisms
may provide important insights into the diversity of
chemical reactions of enzymes. Clustering analysis is very
common practice in such research. However, clustering algorithms suffer from various issues, such as
requiring determination of the input parameters and stopping criteria, and
very often a need to specify the number of clusters in advance.
Using several well-known metrics, we tried to optimize the clustering
outputs for each of the algorithms, with equivocal results that suggested the
existence of between two and over a hundred clusters. This motivated us to
design and implement our own algorithm, PFClust (Parameter-Free Clustering),
which requires no prior information to determine the number of clusters.
The analysis highlights the structure of enzyme reactions at both the overall and mechanistic
levels. This suggests that mechanistic similarity can influence approaches
for function prediction and automatic annotation of newly discovered protein
and gene sequences.
We then develop and evaluate a method for enzyme function prediction
using machine learning. Our results suggest that pairs of similar
enzyme reactions tend to proceed by different mechanisms. The machine
learning method needs only chemoinformatics descriptors as an input and
is applicable for regression analysis.
The last phase of this work is to test the evolution of chemical mechanisms
mapped onto ancestral enzymes. The occurrence and abundance of domains
in modern proteins have shown that the α/β architecture is probably
the oldest fold design. These observations have important implications for
the origins of biochemistry and for exploring structure-function relationships.
Over half of the known mechanisms were introduced before architectural
diversification in evolutionary time. The other half
were invented gradually over the evolutionary timeline, just after organismal
diversification. Moreover, many common mechanisms, including fundamental
building blocks of enzyme chemistry, were found to be associated
with the ancestral fold.
BDNF: mRNA expression in urine cells of patients with chronic kidney disease and its role in kidney function
Podocyte loss and changes to the complex morphology are major causes of chronic kidney disease (CKD). As the incidence has been increasing continuously over the last decades without sufficient treatment, it is important to find predictive biomarkers. Therefore, we measured urinary mRNA levels of the podocyte genes NPHS1, NPHS2 and PODXL, as well as BDNF, KIM-1 and CTSL, by qRT-PCR in 120 CKD patients. We showed a strong correlation between BDNF and the kidney injury marker KIM-1, which were also correlated with NPHS1, suggesting podocytes as a contributing source. In human biopsies, BDNF was localized in the cell body and major processes of podocytes. In glomeruli of diabetic nephropathy patients, we found a strong BDNF signal in the remaining podocytes. Inhibition of the BDNF receptor TrkB resulted in enhanced podocyte dedifferentiation. Knockdown of the orthologue resulted in pericardial oedema formation and lowered viability of zebrafish larvae. We found an enlarged Bowman's space, dilated glomerular capillaries, podocyte loss and impaired glomerular filtration. We demonstrated that BDNF is essential for glomerular development, morphology and function, and that the expression of BDNF and KIM-1 is highly correlated in urine cells of CKD patients. Therefore, BDNF mRNA in urine cells could serve as a potential CKD biomarker.
Is EC class predictable from reaction mechanism?
We thank the Scottish Universities Life Sciences Alliance (SULSA) and the Scottish Overseas Research Student Awards Scheme of the Scottish Funding Council (SFC) for financial support. Background: We investigate the relationships between the EC (Enzyme Commission) class, the associated chemical reaction, and the reaction mechanism by building predictive models using Support Vector Machine (SVM), Random Forest (RF) and k-Nearest Neighbours (kNN). We consider two ways of encoding the reaction mechanism in descriptors, and also three approaches that encode only the overall chemical reaction. Both cross-validation and an external test set are used. Results: The three descriptor sets encoding overall chemical transformation perform better than the two descriptions of mechanism. SVM and RF models perform comparably well; kNN is less successful. Oxidoreductases and hydrolases are relatively well predicted by all types of descriptor; isomerases are well predicted by overall reaction descriptors but not by mechanistic ones. Conclusions: Our results suggest that pairs of similar enzyme reactions tend to proceed by different mechanisms. Oxidoreductases, hydrolases, and to some extent isomerases and ligases, have clear chemical signatures, making them easier to predict than transferases and lyases. We find evidence that isomerases as a class are notably mechanistically diverse and that their one shared property, of substrate and product being isomers, can arise in various unrelated ways. The performance of the different machine learning algorithms is in line with many cheminformatics applications, with SVM and RF being roughly equally effective. kNN is less successful, given the role that non-local information plays in successful classification. We note also that, despite a lack of clarity in the literature, EC number prediction is not a single problem; the challenge of predicting protein function from available sequence data is quite different from assigning an EC classification from a cheminformatics representation of a reaction.
Publisher PDF; peer reviewed
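As a rough illustration of the kind of evaluation this abstract describes, the sketch below runs leave-one-out cross-validation of a k-nearest-neighbour classifier over binary reaction descriptors. The toy fingerprints, the class labels attached to them, and the function names (`tanimoto`, `knn_predict`, `leave_one_out_accuracy`) are all invented for illustration; they are not the descriptors, data, or code used in the study, which also evaluated SVM and RF models.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' descriptor bits."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def knn_predict(query, training, k=3):
    """Majority vote among the k training items most similar to the query."""
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k=3):
    """Hold out each item in turn, predict it from the rest, and report accuracy."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(features, rest, k) == label:
            hits += 1
    return hits / len(data)


# Hypothetical toy data: (set of descriptor bits, top-level EC class).
toy = [
    ({1, 2, 3}, "EC 1"),
    ({1, 2, 4}, "EC 1"),
    ({2, 3, 4}, "EC 1"),
    ({7, 8, 9}, "EC 3"),
    ({7, 8, 10}, "EC 3"),
    ({8, 9, 10}, "EC 3"),
]
accuracy = leave_one_out_accuracy(toy, k=1)
```

Because kNN votes only among the nearest neighbours, it uses purely local information. That is consistent with the abstract's remark that it does less well when non-local information matters.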
The natural history of biocatalytic mechanisms
JBOM and NN thank the Scottish Universities Life Science Alliance (SULSA) http://www.sulsa.ac.uk/ and Scottish Funding Council (SFC) http://www.sfc.ac.uk/ for financial support. JBOM thanks the Biotechnology and Biological Sciences Research Council (BBSRC) http://www.bbsrc.ac.uk/ for financial support through grant BB/I00596X/1 and GCA the National Science Foundation (OISE-1132791) http://www.nsf.gov/ and the United States Department of Agriculture (ILLU-802-909 and ILLU-483-625) http://www.csrees.usda.gov/ for financial support. EaStCHEM http://www.eastchem.ac.uk/ provided access to the ECRF computing facility. Phylogenomic analysis of the occurrence and abundance of protein domains in proteomes has recently shown that the α/β architecture is probably the oldest fold design. This holds important implications for the origins of biochemistry. Here we explore structure-function relationships addressing the use of chemical mechanisms by ancestral enzymes. We test the hypothesis that the oldest folds used the most mechanisms. We start by tracing biocatalytic mechanisms operating in metabolic enzymes along a phylogenetic timeline of the first appearance of homologous superfamilies of protein domain structures from CATH. A total of 335 enzyme reactions were retrieved from MACiE and were mapped over fold age. We define a mechanistic step type as one of the 51 mechanistic annotations given in MACiE, and each step of each of the 335 mechanisms was described using one or more of these annotations. We find that the first two folds, the P-loop containing nucleotide triphosphate hydrolase and the NAD(P)-binding Rossmann-like homologous superfamilies, were α/β architectures responsible for introducing 35% (18/51) of the known mechanistic step types. We find that these two oldest structures in the phylogenomic analysis of protein domains introduced many mechanistic step types that were later combinatorially spread in catalytic history. The most common mechanistic step types included fundamental building blocks of enzyme chemistry: “Proton transfer,” “Bimolecular nucleophilic addition,” “Bimolecular nucleophilic substitution,” and “Unimolecular elimination by the conjugate base.” They were associated with the most ancestral fold structure typical of P-loop containing nucleotide triphosphate hydrolases. Over half of the mechanistic step types were introduced in the evolutionary timeline before the appearance of structures specific to diversified organisms, during a period of architectural diversification. The other half unfolded gradually after organismal diversification and during a period that spanned ~2 billion years of evolutionary history.
Publisher PDF; peer reviewed
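The tracing of step types along the fold-age timeline can be sketched as follows. This is a minimal illustration, assuming each enzyme entry is reduced to a pair of fold age (nd) and a set of step-type names. The toy entries, the label "Hypothetical late step", and the function names are invented here and are not taken from MACiE or from the authors' code.

```python
def first_appearances(entries):
    """Map each step type to the smallest nd (oldest fold age) at which it occurs."""
    first = {}
    for nd, step_types in entries:
        for step in step_types:
            if step not in first or nd < first[step]:
                first[step] = nd
    return first


def fraction_introduced_by(first, nd_cutoff):
    """Fraction of all observed step types first seen at or before nd_cutoff."""
    introduced = sum(1 for nd in first.values() if nd <= nd_cutoff)
    return introduced / len(first)


# Hypothetical (nd, step types) pairs; the nd values are made up.
toy_entries = [
    (0.00, {"Proton transfer", "Bimolecular nucleophilic substitution"}),
    (0.02, {"Proton transfer", "Bimolecular nucleophilic addition"}),
    (0.35, {"Unimolecular elimination by the conjugate base"}),
    (0.80, {"Proton transfer", "Hypothetical late step"}),
]
first = first_appearances(toy_entries)
early_fraction = fraction_introduced_by(first, 0.02)

# The proportion quoted in the abstract: 18 of 51 step types, about 35%.
reported_percent = round(18 / 51 * 100)
```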
The history of biocatalytic mechanisms.
<p>The heat map describes the distribution of presence (red) and absence (yellow) of mechanism step types (y-axis) over fold age (x-axis). Rows of the heat map (mechanisms) are ordered vertically according to the first appearance of the step type in time, with the oldest at the top. The row sidebars at the top of the heat map are used to describe the number of MACiE entries and CATH H-level domain structures (annotated as number of folds) appearing at each fold age, and presence of top-level EC classes that are associated with these H-level structures (see color key). The x-axis scale reflects the different <i>nd</i> values found in our dataset, arranged from the oldest on the left to the youngest on the right. Every unique <i>nd</i> value forms a separate column. The non-linear scale is defined by the number of unique <i>nd</i> values falling in each interval of <i>nd</i>. There are many distinct <i>nd</i> values between 0.0 and 0.3 found in our dataset, so the scale is expanded in this region. There are few distinct <i>nd</i> values between 0.7 and 1.0, so the scale is very condensed in that region. Geological time is taken to be approximately linear with <i>nd</i>, where <i>nd</i> = 0 represents the origin of the protein world approximately 3.8 billion years ago and <i>nd</i> = 1 corresponds to the present <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003642#pcbi.1003642-Wang1" target="_blank">[4]</a>.</p>
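The layout described in this caption can be sketched in a few lines. The sketch assumes each entry is a pair of nd value and a set of step types, builds one column per unique nd value (which is what makes the axis non-linear), orders rows by first appearance, and converts nd to an approximate age using the linear relation stated in the caption. The step-type labels "A" to "D", the nd values, and the function names are placeholders invented for this example.

```python
ORIGIN_GYA = 3.8  # origin of the protein world, in billions of years ago (from the caption)


def nd_to_gya(nd):
    """Approximate age in billions of years ago, assuming time is linear in nd."""
    return ORIGIN_GYA * (1.0 - nd)


def presence_matrix(entries):
    """Return (columns, rows, matrix) with one column per unique nd, oldest first.

    Rows are step types ordered by first appearance (ties broken by name);
    a cell is 1 where the step type is present at that nd and 0 where absent.
    """
    columns = sorted({nd for nd, _ in entries})
    first = {}
    for nd, steps in entries:
        for step in steps:
            first[step] = min(nd, first.get(step, nd))
    rows = sorted(first, key=lambda step: (first[step], step))
    present = {(step, nd) for nd, steps in entries for step in steps}
    matrix = [[1 if (step, nd) in present else 0 for nd in columns] for step in rows]
    return columns, rows, matrix


# Hypothetical entries; two of them share nd = 0.1 and so share a column.
toy_entries = [
    (0.0, {"A", "B"}),
    (0.1, {"A", "C"}),
    (0.1, {"B"}),
    (0.9, {"C", "D"}),
]
columns, rows, matrix = presence_matrix(toy_entries)
```

Here the three columns are equally wide even though the gap between 0.1 and 0.9 is much larger than the gap between 0.0 and 0.1. This is the expansion and compression of the scale that the caption describes.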
MACiE enzymes for purine metabolism.
<p>Table columns are: MACiE code, Enzyme name, EC number, Purine metabolic subnetwork <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003642#pcbi.1003642-CaetanoAnolls6" target="_blank">[41]</a>, PDB code, CATH H-level Structure, nd value and mechanistic step types.</p>
Pattern 133, the mechanistic step types associated with CATH 3.20.20.70, Aldolase class I.
<p>Pattern 133, the mechanistic step types associated with CATH 3.20.20.70, Aldolase class I.</p>
Heat map representing the similarity of mechanistic step types utilised by the H-level structures.
<p>For this we have calculated the Jaccard similarity scores. The x and y axes in the plot are ordered using a hierarchical clustering algorithm in which the two most similar data points are linked together at each iteration. The colors of the heat map represent the similarity scores, where yellow indicates low or no (score 0) similarity and white (score 1) means that identical combinations of mechanistic steps are shared between two H-level structures. The top left corner shows the color key for the similarity scores and the distribution of the similarity scores.</p>
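The Jaccard score used for this heat map is the size of the intersection of two sets divided by the size of their union. The sketch below computes a pairwise similarity matrix over sets of step types and finds the most similar pair, which is the first link an agglomerative clustering of this kind would make. The labels `fold_A`, `fold_B` and `fold_C`, their step-type assignments, and the function names are hypothetical. Scoring two empty sets as 0 is a convention chosen here, not something stated in the caption.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; two empty sets score 0.0 by convention here."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(step_sets):
    """Return (names, matrix) of pairwise Jaccard scores, names in sorted order."""
    names = sorted(step_sets)
    matrix = [[jaccard(step_sets[x], step_sets[y]) for y in names] for x in names]
    return names, matrix


def most_similar_pair(step_sets):
    """The pair with the highest Jaccard score, i.e. the first link to be made."""
    return max(
        combinations(sorted(step_sets), 2),
        key=lambda pair: jaccard(step_sets[pair[0]], step_sets[pair[1]]),
    )


# Hypothetical H-level structures and their step types.
toy = {
    "fold_A": {"Proton transfer", "Bimolecular nucleophilic addition"},
    "fold_B": {
        "Proton transfer",
        "Bimolecular nucleophilic addition",
        "Bimolecular nucleophilic substitution",
    },
    "fold_C": {"Unimolecular elimination by the conjugate base"},
}
names, matrix = similarity_matrix(toy)
closest = most_similar_pair(toy)
```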